{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Learning PyTorch\n",
    "## What is PyTorch?\n",
    "PyTorch is a Python-based scientific computing package targeted at two sets of audiences:\n",
    "1. A replacement for numpy to use the power of GPUs,\n",
    "2. A deep learning research platform that provides maximum flexibility and speed."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting Started:\n",
    "### Tensors:\n",
    "Tensors are similar to numpys' ndarray, with the addition being that Tensors can also be used on a GPU to accelerate computing.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Construct a $5\\times3$ matrix, uninitialized:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      " 9.7746e-35  4.5706e-41  9.7746e-35\n",
      " 4.5706e-41 -1.3938e+14  4.5705e-41\n",
      "-1.3938e+14  4.5705e-41 -1.3938e+14\n",
      " 4.5705e-41 -1.3938e+14  4.5705e-41\n",
      "-1.3938e+14  4.5705e-41 -1.3938e+14\n",
      "[torch.FloatTensor of size 5x3]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "\n",
    "x = torch.Tensor(5, 3)\n",
    "print(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Construct a randomly initialized matrix:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      " 0.3519  0.3081  0.5241\n",
      " 0.8461  0.5484  0.8138\n",
      " 0.2109  0.7964  0.6129\n",
      " 0.2227  0.6007  0.2532\n",
      " 0.2021  0.6260  0.4046\n",
      "[torch.FloatTensor of size 5x3]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "x = torch.rand(5, 3)\n",
    "print(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get its size:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "torch.Size([5, 3])\n"
     ]
    }
   ],
   "source": [
    "print(x.size())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Operations:\n",
    "There are multiple syntaxes for operations. Let's see addition as an example...\n",
    "* Addition: syntax 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      " 0.4318  0.8357  0.8879\n",
      " 1.8070  1.5154  1.4950\n",
      " 0.4707  1.7179  1.3634\n",
      " 0.4693  1.5691  0.4152\n",
      " 0.2309  0.9294  0.4910\n",
      "[torch.FloatTensor of size 5x3]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "y = torch.rand(5, 3)\n",
    "print(x + y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Addition: syntax 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      " 0.4318  0.8357  0.8879\n",
      " 1.8070  1.5154  1.4950\n",
      " 0.4707  1.7179  1.3634\n",
      " 0.4693  1.5691  0.4152\n",
      " 0.2309  0.9294  0.4910\n",
      "[torch.FloatTensor of size 5x3]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(torch.add(x, y))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Addition: giving an output tensor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      " 0.4318  0.8357  0.8879\n",
      " 1.8070  1.5154  1.4950\n",
      " 0.4707  1.7179  1.3634\n",
      " 0.4693  1.5691  0.4152\n",
      " 0.2309  0.9294  0.4910\n",
      "[torch.FloatTensor of size 5x3]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "result = torch.Tensor(5, 3)\n",
    "torch.add(x, y, out=result)\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Addition: in-place"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      " 0.4318  0.8357  0.8879\n",
      " 1.8070  1.5154  1.4950\n",
      " 0.4707  1.7179  1.3634\n",
      " 0.4693  1.5691  0.4152\n",
      " 0.2309  0.9294  0.4910\n",
      "[torch.FloatTensor of size 5x3]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# adds x to y\n",
    "y.add_(x)\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can use standard numpy-like indexing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      " 0.3081\n",
      " 0.5484\n",
      " 0.7964\n",
      " 0.6007\n",
      " 0.6260\n",
      "[torch.FloatTensor of size 5]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(x[:, 1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Numpy Bridge:\n",
    "You can convert a torch Tensor to a numpy array and vice versa.\n",
    "The torch Tensor and numpy array will share their underlying memory locations, and changing one will change the other.\n",
    "### Converting torch Tensor to numpy Array"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      " 1\n",
      " 1\n",
      " 1\n",
      " 1\n",
      " 1\n",
      "[torch.FloatTensor of size 5]\n",
      "\n",
      "[ 1.  1.  1.  1.  1.]\n"
     ]
    }
   ],
   "source": [
    "a = torch.ones(5)\n",
    "print(a)\n",
    "b = a.numpy()\n",
    "print(b)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "See how to numpy array changed in value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      " 2\n",
      " 2\n",
      " 2\n",
      " 2\n",
      " 2\n",
      "[torch.FloatTensor of size 5]\n",
      "\n",
      "[ 2.  2.  2.  2.  2.]\n"
     ]
    }
   ],
   "source": [
    "a.add_(1)\n",
    "print(a)\n",
    "print(b)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Converting numpy Array to torch Tensor\n",
    "See how changing the numpy array changed the torch Tensor automatically."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ 2.  2.  2.  2.  2.]\n",
      "\n",
      " 2\n",
      " 2\n",
      " 2\n",
      " 2\n",
      " 2\n",
      "[torch.DoubleTensor of size 5]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "a = np.ones(5)\n",
    "b = torch.from_numpy(a)\n",
    "np.add(a, 1, out=a)\n",
    "print(a)\n",
    "print(b)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### CUDA Tensors\n",
    "Tensors can be moved onto GPU using the .cuda function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# let us run this cell only if CUDA is available\n",
    "if torch.cuda.is_available():\n",
    "    x = x.cuda()\n",
    "    y = y.cuda()\n",
    "    print(x + y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Autograd: automatic differentiation\n",
    "Central to all neural networks in PyTorch is the **autograd** package. The **autograd** package provides automatic differentiation for all operations on Tensors.\n",
    "\n",
    "## Variable\n",
    "**autograd.Variable** is the central class of the package. It wraps a Tensor, and supports nearly all operations defined on it. Once you finish your computation you can call **.backward()** and have all the gradients computed automatically.\n",
    "You can access the raw tensor through the **.data** attribute, while the gradient w.r.t. this variable is accumulated into **.grad**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import torch\n",
    "from torch.autograd import Variable"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a variable:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Variable containing:\n",
      " 1  1\n",
      " 1  1\n",
      "[torch.FloatTensor of size 2x2]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "x = Variable(torch.ones(2, 2), requires_grad=True)\n",
    "print(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Do an operation on variable:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Variable containing:\n",
      " 3  3\n",
      " 3  3\n",
      "[torch.FloatTensor of size 2x2]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "y = x + 2\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Do more operations on y:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Variable containing:\n",
      " 27  27\n",
      " 27  27\n",
      "[torch.FloatTensor of size 2x2]\n",
      " Variable containing:\n",
      " 27\n",
      "[torch.FloatTensor of size 1]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "z = y * y * 3\n",
    "out = z.mean()\n",
    "print(z, out)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Gradients\n",
    "Backprop using **out.backward()** is equivalent to doing **out.backward(torch.Tensor([1.0]))**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "out.backward()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "print gradients d(out)/dx"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Variable containing:\n",
      " 4.5000  4.5000\n",
      " 4.5000  4.5000\n",
      "[torch.FloatTensor of size 2x2]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(x.grad)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Variable containing:\n",
      "  638.1205\n",
      "-1033.7494\n",
      "  851.3856\n",
      "[torch.FloatTensor of size 3]\n",
      "\n",
      "Variable containing:\n",
      "  102.4000\n",
      " 1024.0000\n",
      "    0.1024\n",
      "[torch.FloatTensor of size 3]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "x = torch.randn(3)\n",
    "x = Variable(x, requires_grad=True)\n",
    "\n",
    "y = x * 2\n",
    "while y.data.norm() < 1000:\n",
    "    y *= 2\n",
    "print(y)\n",
    "\n",
    "gradients = torch.FloatTensor([0.1, 1.0, 0.0001])\n",
    "y.backward(gradients)\n",
    "print(x.grad)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Neural Networks\n",
    "Neural networks can be constructed using the **torch.nn** package.\n",
    "**nn** deponds on **autograd** to define models and differentiate them. An **nn.Module** contains layers, and a method **forward(input)** that returns the **output**.\n",
    "A typical training procedure for a neural network is as follows:\n",
    "* Define the neural network that has some learnable parameters (or weights)\n",
    "* Iterate over a dataset of inputs\n",
    "* Process the input through the network\n",
    "* Compute the loss (how far is the output from being correct)\n",
    "* Propagate gradients back into the network's parameters\n",
    "* Update the weights of the network, typically using a simple udate rule: $weight = weight - learning_rate * gradient$\n",
    "\n",
    "### Define the network\n",
    "Let's define this network:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Net (\n",
      "  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))\n",
      "  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))\n",
      "  (fc1): Linear (400 -> 120)\n",
      "  (fc2): Linear (120 -> 84)\n",
      "  (fc3): Linear (84 -> 10)\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "from torch.autograd import Variable\n",
    "import torch.nn as nn\n",
    "import torch.nn.functional as F\n",
    "\n",
    "\n",
    "class Net(nn.Module):\n",
    "\n",
    "    def __init__(self):\n",
    "        super(Net, self).__init__()\n",
    "        # 1 input image channel, 6 output channels, 5x5 square convolution\n",
    "        # kernel\n",
    "        self.conv1 = nn.Conv2d(1, 6, 5)\n",
    "        self.conv2 = nn.Conv2d(6, 16, 5)\n",
    "        # an affine operation: y = Wx + b\n",
    "        self.fc1 = nn.Linear(16 * 5 * 5, 120)\n",
    "        self.fc2 = nn.Linear(120, 84)\n",
    "        self.fc3 = nn.Linear(84, 10)\n",
    "\n",
    "    def forward(self, x):\n",
    "        # Max pooling over a (2, 2) window\n",
    "        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))\n",
    "        # If the size is a square you can only specify a single number\n",
    "        x = F.max_pool2d(F.relu(self.conv2(x)), 2)\n",
    "        x = x.view(-1, self.num_flat_features(x))\n",
    "        x = F.relu(self.fc1(x))\n",
    "        x = F.relu(self.fc2(x))\n",
    "        x = self.fc3(x)\n",
    "        return x\n",
    "\n",
    "    def num_flat_features(self, x):\n",
    "        size = x.size()[1:]  # all dimensions except the batch dimension\n",
    "        num_features = 1\n",
    "        for s in size:\n",
    "            num_features *= s\n",
    "        return num_features\n",
    "\n",
    "\n",
    "net = Net()\n",
    "print(net)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You just have to define the **foreward** function, and the **backward** function (where gradients are computed) is automatically defined for you using **autograd**. You can use any of the Tensor operations in the **forward** function.\n",
    "\n",
    "The learnable parameters of a model are returned by **net.parameters()**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "10\n",
      "torch.Size([6, 1, 5, 5])\n"
     ]
    }
   ],
   "source": [
    "params = list(net.parameters())\n",
    "print(len(params))\n",
    "print(params[0].size())  # conv1's weight"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The input to the forward is an **autograd.Variable**, and so is the output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Variable containing:\n",
      "-0.0353 -0.1519 -0.0046 -0.1127 -0.1318 -0.1958 -0.0085  0.0600  0.0778 -0.0067\n",
      "[torch.FloatTensor of size 1x10]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "input = Variable(torch.randn(1, 1, 32, 32))\n",
    "out = net(input)\n",
    "print(out)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Zero the gradient buffers of all parameters and backprops with random gradients:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "net.zero_grad()\n",
    "out.backward(torch.randn(1, 10))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "### Loss Function\n",
    "The loss function takes the (output, target) pair of inputs and computes a value that estimates how far away the output is from the target.\n",
    "There are several different **loss functions** under the nn package. A simple loss is: **nn.MSELoss** which computes the mean-squared error between the input and the target."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Variable containing:\n",
      " 38.8268\n",
      "[torch.FloatTensor of size 1]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "output = net(input)\n",
    "target = Variable(torch.arange(1, 11))\n",
    "criterion = nn.MSELoss()\n",
    "\n",
    "loss = criterion(output, target)\n",
    "print(loss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Backprop\n",
    "To backpropagate the error all we have to do is **loss.backward()**. Need to clear the existing gradients though, else gradients will be accumulated to existing gradients.\n",
    "\n",
    "Call **loss.backward()** and look at conv1's bias gradients before and after the backward."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "conv1.bias.grad before backward\n",
      "Variable containing:\n",
      " 0\n",
      " 0\n",
      " 0\n",
      " 0\n",
      " 0\n",
      " 0\n",
      "[torch.FloatTensor of size 6]\n",
      "\n",
      "conv1.bias.grad after backward\n",
      "Variable containing:\n",
      " 0.0992\n",
      "-0.0004\n",
      " 0.0713\n",
      " 0.0228\n",
      "-0.1572\n",
      " 0.1141\n",
      "[torch.FloatTensor of size 6]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "net.zero_grad()  # zeros gradient buffers of all parameters\n",
    "print(\"conv1.bias.grad before backward\")\n",
    "print(net.conv1.bias.grad)\n",
    "\n",
    "loss.backward()\n",
    "\n",
    "print(\"conv1.bias.grad after backward\")\n",
    "print(net.conv1.bias.grad)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Update the weights\n",
    "The simplest update rule used in practice is the Stochastic Gradient Descent (SGD):\n",
    "$weight = weight - learning_rate * gradient$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "learning_rate = 0.01\n",
    "for f in net.parameters():\n",
    "    f.data.sub_(f.grad.data * learning_rate)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "However, as you use neural networks, you want to use various different update rules as SGC, Adam, RSMProp, etc. To enable this, build a small package: **torch.optim** that implements all these methods."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import torch.optim as optim\n",
    "\n",
    "# create your optimizer\n",
    "optimizer = optim.SGD(net.parameters(), lr=0.01)\n",
    "\n",
    "# in your training loop:\n",
    "optimizer.zero_grad()  # zero the gradient buffers\n",
    "output = net(input)\n",
    "loss = criterion(output, target)\n",
    "loss.backward()\n",
    "optimizer.step()  # does the update"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
